Under a single-index regression assumption, we introduce a new semiparametric procedure
to estimate the conditional density of a censored response. The regression model can be
seen as a generalization of the Cox regression model and also as a useful tool to perform
dimension reduction under censoring. This technique extends the results of Delecroix et
al. (2003). We derive consistency and asymptotic normality of our estimator of the index
parameter by proving its asymptotic equivalence with the (uncomputable) maximum likelihood
estimator, using martingale results for counting processes and arguments from empirical
process theory. Furthermore, we provide a new adaptive procedure which allows us both
to choose the smoothing parameter involved in our approach and to circumvent the weak
performance of the Kaplan-Meier estimator (1958) in the right tail of the distribution. Through
a simulation study, we examine the behavior of our estimator for small samples.

Keywords: asymptotic normality; empirical processes; censoring; martingales for counting
processes; pseudo-maximum likelihood; single-index model

1 Introduction 

A major issue of recent papers dealing with censored regression is to propose alternatives to
the popular Cox regression model. This model, also known as the multiplicative hazard regression
model (see Cox (1972)), imposes a semiparametric structure on the conditional hazard function.
Estimation in this model is traditionally performed using pseudo-likelihood techniques, and the
theoretical properties of these procedures are covered by a large number of papers (see e.g.
Fleming and Harrington (1991)). However, in some situations, the assumptions of the Cox
regression model are clearly not satisfied by the data set. In this paper, our aim is to perform
estimation in a semiparametric regression model which allows more flexibility than the Cox
regression model. This new technique can be seen as a particularly interesting alternative, since
it is valid in a larger number of situations than the multiplicative hazard model.

Alternatives to the Cox regression model mostly focus on the estimation of a conditional
expectation, or of a quantile regression model. Koul et al. (1981), Stute (1999) and Delecroix et al.
consider mean-regression models where the regression function belongs to a parametric family,
but with an unknown distribution of the residuals. Parametric quantile regression was studied
by Gannoun et al. On the other hand, Lu and Burke (2005) and Lopez (2008) considered
a semiparametric single-index regression model. Single-index regression models were initially
introduced to circumvent the so-called "curse of dimensionality" in nonparametric regression
(see, e.g., Ichimura (1993)), by assuming that the conditional expectation only depends on an
unknown linear combination of the covariates. Another appealing aspect of such models
is that they include the Cox regression model as a particular case. The main assumption of this
model is that the conditional density only depends on an unknown linear combination of the
covariates, while the multiplicative hazard model states a similar assumption on the conditional
hazard rate. In this paper we focus on estimation of the parameter in a regression model in
which the conditional density of the response satisfies a single-index assumption. We provide
asymptotic results for a new M-estimation procedure for the index parameter. This procedure
can be seen as a generalization of the method of Delecroix et al. (2003) to the case of censored
regression.

As in the uncensored case, we show that, regarding the estimation of the parametric part
of our model, there is an asymptotic equivalence between our semiparametric approach and
a parametric one relying on some prior knowledge of the family of regression functions. For
the nonparametric part, we use kernel estimators of conditional densities as in Delecroix et al.
(2003). Since the performance of kernel estimators strongly relies on the choice of the smoothing
parameter, we also provide a method to choose this parameter adaptively. Another technical
issue in our approach concerns a truncation parameter involved in our procedure. This problem
of truncation comes directly from the censored framework, where estimators of the underlying
distribution functions sometimes fail to estimate the tail of the distribution correctly. This
problem is traditionally circumvented, for example, by imposing integrability assumptions on the
response and censoring distributions, see e.g. Stute (1999). On the other hand, a truncation
procedure consists of removing the observations which are too large from the estimation of the
regression function, see e.g. Heuchenne and Van Keilegom (2007), or condition (2.2) in Brunel and
Comte (2006), which can be interpreted as this kind of truncation. Until now, the truncation bounds
which were used were arbitrarily fixed, and usually no method was proposed for choosing this
truncation bound in practical situations. Therefore, in the new method we
propose, we also provide a data-driven procedure to choose the truncation parameter. In our
practical implementations, we used a criterion based on an asymptotic discussion which focuses
on the mean squared error associated with the estimation of the single-index parameter. We also
suggest some possible adaptations to other types of criteria which are covered by our theoretical
results.

In Section 2 we introduce our censored regression model and present our estimation
procedure. It relies on the Kaplan-Meier estimator (1958) of the distribution function, and on
semiparametric estimators of the conditional density. Following the procedure of Delecroix et
al. (2003), we consider kernel-based estimators. Our theoretical results are presented in
Section 3. In Section 4 we report simulation results and an analysis of real data. Section 5 contains
the detailed proof of our Main Lemma, which states the asymptotic equivalence of estimating the
parameter in the semiparametric and parametric models. All the technicalities are postponed
to Section 7.



2 Censored regression model and estimation procedure 

2.1 Notations and general setting 

Let $Y_1, \ldots, Y_n$ be i.i.d. copies of a random response variable $Y \in \mathbb{R}$, and let $X_1, \ldots, X_n$ be i.i.d.
copies of a random vector of covariates $X \in \mathcal{X}$, where $\mathcal{X}$ is a compact subset of $\mathbb{R}^d$. Introducing
$C_1, \ldots, C_n$ i.i.d. replications of the censoring variable $C \in \mathbb{R}$, we consider the following censored
regression framework, where the observations are

$Z_i = Y_i \wedge C_i$, $1 \le i \le n$,

$\delta_i = 1_{Y_i \le C_i}$, $1 \le i \le n$,

$X_i \in \mathcal{X} \subset \mathbb{R}^d$, $1 \le i \le n$.

Let us introduce some notation for the distribution functions of the random variables appearing
in this model, that is $H(t) = P(Z \le t)$, $F_X(t) = P(X \le t)$, $F_Y(t) = P(Y \le t)$, $F_{X,Y}(x,y) =
P(X \le x, Y \le y)$ and $G(t) = P(C \le t)$. A major difficulty arising in censored regression models
is that the empirical distribution function is unavailable for estimating $F_Y$, $F_{X,Y}$
and $G$, and must be replaced by Kaplan-Meier estimators.

We are interested in estimating $f(y|x)$, where $f(y|x)$ denotes the conditional density of $Y$
given $X = x$ evaluated at the point $y$. If one has no insight into the function $f$, it becomes necessary to
perform nonparametric estimation of the conditional density. In the absence of censoring, a classical
way to proceed is to use kernel smoothing, see e.g. Bashtannyk and Hyndman (2001). However,
the so-called "curse of dimensionality" prevents this approach from being of practical interest
when the number of covariates is large ($d > 3$ in practice). Therefore it becomes relevant to
consider semiparametric models, which appear to be a good compromise between the parametric
approach (which relies on strong assumptions on the function $f$ that may not hold in practice) and the
nonparametric approach (which relies on fewer assumptions). In the following, we will consider
the semiparametric single-index regression model

$\exists\, \theta_0 \in \Theta \subset \mathbb{R}^d$ s.t. $f(y|x) = f_{\theta_0}(y, x'\theta_0)$, (1)

where $f_\theta(y,u)$ denotes the conditional density of $Y$ given $X'\theta = u$ evaluated at $y$. For
identifiability reasons, we will impose that the first component of $\theta_0$ is one. In comparison to the Cox
regression model for absolutely continuous variables, our model (1) is more general, since it only
assumes that the law of $Y$ given $X$ depends on an unknown linear combination of the covariates,
without imposing additional conditions on the conditional hazard rate.

Model (1) has been considered by Delecroix et al. (2003) in the uncensored case. However,
their procedure cannot be directly applied in the censored framework since the response
variables are not directly observed. As a consequence, the empirical distribution function is
unavailable, and most of the tools used in this context are not at our disposal. A solution consists of
using procedures relying on Kaplan-Meier estimators of the distribution function. An important
difficulty arising in this type of technique is the poor behavior of Kaplan-Meier estimators
in the tail of the distribution. A practical way to avoid this drawback is
to consider a truncated version of the variable $Y$. In the following, we will consider $A_\tau$, a sequence
of compact sets included in $\{t : \tau_1 \le t \le \tau\}$, for $\tau \le \tau_0$, where $\tau_0 < \inf\{t : H(t) = 1\}$. Using
only the observations in $A_\tau$ allows us to avoid the bad behavior of the usual Kaplan-Meier estimators
in the tail of the distribution. Moreover, this truncation technique is particularly well suited to
our problem of estimating $\theta_0$. In our framework, this truncation does not lead to any asymptotic
bias, since, denoting by $f^\tau(\cdot|x)$ the conditional density of $Y$ given $X = x$ and $Y \in A_\tau$, for any
$\tau < \infty$, we have, under (1),

$f^\tau(y|x) = f^\tau_{\theta_0}(y, x'\theta_0)$, (2)

where $f^\tau_\theta(y,u)$ denotes the conditional density of $Y$ given $X'\theta = u$ and $Y \in A_\tau$ evaluated at
$y$, and where the parameter $\theta_0$ is the same in (1) as in (2). In Section 2.6 we will discuss a new
method for choosing $\tau$ from the data in order to improve the performance in estimating $\theta_0$.
2.2 Estimation procedure 

We will extend the idea behind the procedure developed by Delecroix et al. (2003), adapting
it to our censored framework. First assume that we know the family of functions $f^\tau_\theta$. This
approach is a modification of the maximum likelihood estimation procedure. Define, for any
function $J \ge 0$,

$L^\tau(\theta, J) = E[\log f^\tau_\theta(Y_i, \theta'X_i) J(X_i) 1_{Y_i \in A_\tau}] = \int \log f^\tau_\theta(y, \theta'x) J(x) 1_{y \in A_\tau} \, dF_{X,Y}(x,y)$.

Here, $J$ is a positive trimming function which will be defined later in order to avoid denominator
problems in the nonparametric part of the model, see Section 2.4. From (2), $\theta_0$ maximizes
$L^\tau(\theta, J)$ for any $\tau < \infty$, this maximum being unique under some additional conditions on the
regression model and $J$. Since, in our framework, $F_{X,Y}$ and $f^\tau_\theta$ are unknown, it is natural to
estimate them in order to produce an empirical version of $L^\tau(\theta, J)$.

2.2.1 Estimation of $F_{X,Y}$

In the case where there is no censoring (as in Delecroix et al. (2003)), $F_{X,Y}$ can be estimated
by the empirical distribution function. In our censored framework, the empirical distribution
function of $(X, Y)$ is unavailable, since it relies on the true $Y_i$, which are not observed. A
convenient way to proceed consists of replacing it by some Kaplan-Meier estimator such as the
one proposed by Stute (1993). Let us define the Kaplan-Meier estimator (Kaplan and Meier
(1958)) of $F_Y$,

$\hat F_Y(y) = 1 - \prod_{i : Z_i \le y} \left(1 - \frac{1}{\sum_{j=1}^n 1_{Z_j \ge Z_i}}\right)^{\delta_i} = \sum_{i=1}^n \delta_i W_{in} 1_{Z_i \le y}$,

where $W_{in}$ denotes the "jump" of the Kaplan-Meier estimator at observation $i$ (see Stute (1993)).
To estimate $F_{X,Y}$, Stute proposes to use

$\hat F(x,y) = \sum_{i=1}^n \delta_i W_{in} 1_{Z_i \le y, X_i \le x}$.

Let us also define the following (uncomputable) estimator of the distribution function,

$\tilde F(x,y) = \sum_{i=1}^n \delta_i W_i^* 1_{Z_i \le y, X_i \le x}$,

where $W_i^* = n^{-1}[1 - G(Z_i-)]^{-1}$. The link between $\hat F$ and $\tilde F$ comes from the fact that, in the
case where $P(Y = C) = 0$,

$W_{in} = n^{-1}[1 - \hat G(Z_i-)]^{-1}$, (3)

where $\hat G$ denotes the Kaplan-Meier estimator of $G$ (see Satten and Datta (2001)). Asymptotic
properties of $\hat F$ can be deduced from studying its difference with the simpler but uncomputable
estimator $\tilde F$.
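
As a numerical illustration of identity (3), the following minimal sketch computes the Kaplan-Meier weights $W_{in}$ and the left limits $1 - \hat G(Z_i-)$ on simulated data and compares the two sides. The function names, the exponential laws and the sample size are assumptions made for this example only, and the data are assumed to have no ties.

```python
import random

def km_jump_weights(z, delta):
    """W_in for each observation (no ties assumed); the Kaplan-Meier jump is delta_i * W_in."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    w = [0.0] * n
    prod = 1.0  # product over earlier ranks j of ((n - j) / (n - j + 1)) ** delta_(j)
    for rank, i in enumerate(order, start=1):
        w[i] = prod / (n - rank + 1)
        if delta[i] == 1:
            prod *= (n - rank) / (n - rank + 1)
    return w

def censoring_survival_left(z, delta):
    """1 - G_hat(Z_i-), with G_hat the Kaplan-Meier estimator of the censoring law."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    s = [0.0] * n
    prod = 1.0  # product over earlier ranks j of ((n - j) / (n - j + 1)) ** (1 - delta_(j))
    for rank, i in enumerate(order, start=1):
        s[i] = prod
        if delta[i] == 0:
            prod *= (n - rank) / (n - rank + 1)
    return s

# Hypothetical toy data: exponential responses and censoring times.
random.seed(0)
n = 50
y = [random.expovariate(1.0) for _ in range(n)]
c = [random.expovariate(0.5) for _ in range(n)]
z = [min(a, b) for a, b in zip(y, c)]
delta = [1 if a <= b else 0 for a, b in zip(y, c)]

w = km_jump_weights(z, delta)
s = censoring_survival_left(z, delta)
max_gap = max(abs(wi - 1.0 / (n * si)) for wi, si in zip(w, s))  # both sides of (3)
total_mass = sum(d * wi for d, wi in zip(delta, w))              # mass of F_hat, at most 1
```

The two products telescope to $(n - i + 1)/n$ at rank $i$, which is why both sides of (3) agree up to rounding error.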

If we know the family of regression functions $f^\tau_\theta$, it is possible to compute the empirical
version of $L^\tau(\theta, J)$ using $\hat F$, that is

$L_n^\tau(\theta, f^\tau, J) = \int \log f^\tau_\theta(y, \theta'x) J(x) 1_{y \in A_\tau} \, d\hat F(x,y)$

$= \sum_{i=1}^n \delta_i W_{in} \log f^\tau_\theta(Z_i, \theta'X_i) J(X_i) 1_{Z_i \in A_\tau}$.

In the case $J \equiv 1$, the estimator of $\theta_0$ obtained by maximizing $L_n^\tau$ would turn out to be an
extension of the maximum likelihood estimator of $\theta_0$ used in the presence of censoring.
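
The weighted sum above can be sketched in a few lines. This is only an illustration under stated assumptions: the Gaussian family `gaussian_log_density`, the toy data and the interval `[tau_low, tau_high]` standing for $A_\tau$ are hypothetical choices, not part of the procedure studied in this paper.

```python
import math

def empirical_criterion(theta, z, delta, x, w, log_density,
                        tau_low, tau_high, trim=lambda xi: 1.0):
    """sum_i delta_i W_in log f_theta(Z_i, theta'X_i) J(X_i) 1{Z_i in A_tau}."""
    total = 0.0
    for zi, di, xi, wi in zip(z, delta, x, w):
        if di == 1 and tau_low <= zi <= tau_high:
            index = sum(t * v for t, v in zip(theta, xi))
            total += wi * log_density(zi, index) * trim(xi)
    return total

def gaussian_log_density(y, u):
    # Hypothetical parametric family: Y given theta'X = u is N(u, 1).
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - u) ** 2

# Toy uncensored data lying exactly on the index theta0 = (1, 0.5); weights are 1/n.
x = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
z = [1.0, 1.5, 2.0]
delta = [1, 1, 1]
w = [1.0 / 3.0] * 3

crit_true = empirical_criterion((1.0, 0.5), z, delta, x, w, gaussian_log_density, 0.0, 10.0)
crit_off = empirical_criterion((1.0, 1.0), z, delta, x, w, gaussian_log_density, 0.0, 10.0)
crit_trunc = empirical_criterion((1.0, 0.5), z, delta, x, w, gaussian_log_density, 0.0, 1.7)
```

On this toy sample the criterion is larger at the generating index than at the perturbed one, and lowering the truncation bound simply drops the largest observation from the sum.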

2.3 Estimation of $f^\tau_\theta$

In our regression model (2), the family $\{f^\tau_\theta, \theta \in \Theta\}$ is actually unknown. As in Delecroix et al.
(2003), we propose to use nonparametric kernel smoothing to estimate $f^\tau_\theta$. Introducing a kernel
function $K$ and a sequence of bandwidths $h$, define

$\hat f^{h,\tau}_\theta(z, \theta'x) = \frac{\int K_h(\theta'x - \theta'u) K_h(z - y) 1_{y \in A_\tau} \, d\hat F(u,y)}{\int K_h(\theta'x - \theta'u) 1_{y \in A_\tau} \, d\hat F(u,y)}$, (4)

where $K_h(\cdot) = h^{-1}K(\cdot/h)$. Also define $f^{*h,\tau}_\theta$, the kernel estimator based on the function $\tilde F$, that is

$f^{*h,\tau}_\theta(z, \theta'x) = \frac{\int K_h(\theta'x - \theta'u) K_h(z - y) 1_{y \in A_\tau} \, d\tilde F(u,y)}{\int K_h(\theta'x - \theta'u) 1_{y \in A_\tau} \, d\tilde F(u,y)}$.

$f^{*h,\tau}_\theta$ will play an important role in studying the asymptotic behavior of $\hat f^{h,\tau}_\theta$. Indeed, $f^{*h,\tau}_\theta$ is
theoretically easier to handle, since it relies on sums of i.i.d. quantities, which is not
the case for $\hat F$. Since $f^{*h,\tau}_\theta$ can be studied by standard kernel arguments, the most important
difficulty will arise from studying the difference between $\hat f^{h,\tau}_\theta$ and $f^{*h,\tau}_\theta$.
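
The ratio defining the kernel estimator can be written as two weighted sums over the uncensored observations falling in $A_\tau$. The sketch below is a simplified illustration, not the implementation used in this paper: it uses a second-order Epanechnikov kernel instead of a fourth-order one, a single bandwidth for both arguments, an interval `[tau_low, tau_high]` for $A_\tau$, and uncensored toy data with weights $1/n$; all names are hypothetical.

```python
import random

def epanechnikov(u):
    """Second-order Epanechnikov kernel, used here only for illustration."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def cond_density_hat(zq, xq, theta, z, delta, x, w, h, tau_low, tau_high):
    """Weighted kernel estimate of f_theta^tau(zq, theta'xq) from (Z_i, delta_i, X_i, W_in)."""
    uq = dot(theta, xq)
    num = den = 0.0
    for zi, di, xi, wi in zip(z, delta, x, w):
        if di == 0 or not (tau_low <= zi <= tau_high):
            continue  # censored or truncated observations carry no mass
        kx = epanechnikov((uq - dot(theta, xi)) / h) / h
        num += wi * kx * epanechnikov((zq - zi) / h) / h
        den += wi * kx
    return num / den if den > 0.0 else float("nan")

# Toy check without censoring: all weights equal 1/n and A_tau covers every response.
rng = random.Random(1)
n, h, theta = 100, 0.3, (1.0, 0.5)
x = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(n)]
z = [dot(theta, xi) + rng.gauss(0.0, 0.3) for xi in x]
delta, w = [1] * n, [1.0 / n] * n

step = 0.01
lo, hi = min(z) - h, max(z) + h
grid = [lo + (j + 0.5) * step for j in range(int((hi - lo) / step) + 1)]
values = [cond_density_hat(zq, x[0], theta, z, delta, x, w, h, -10.0, 10.0) for zq in grid]
integral = sum(values) * step  # Riemann sum of the estimated density over z
```

Because numerator and denominator share the same weights, the estimate integrates to one in its first argument whenever the denominator is positive. With a fourth-order kernel the estimate may take negative values, which this second-order sketch does not show.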

In the following, we will impose the conditions below on the kernel function. 

Assumption 1. Assume that 

(A1) $K$ is a twice differentiable fourth-order kernel with derivatives of order 0, 1 and 2 of
bounded variation. Its support is contained in $[-1/2, 1/2]$ and $\int_{\mathbb{R}} K(s)\,ds = 1$,

(A2) $\|K\|_\infty := \sup_{x \in \mathbb{R}} |K(x)| < \infty$,

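(A3) $\mathcal{K} := \{K((x - \cdot)/h) : h > 0, x \in \mathbb{R}^d\}$ is a pointwise measurable class of functions,

(A4) $h \in \mathcal{H}_n \subset [an^{-\alpha}, bn^{-\alpha}]$ with $a, b \in \mathbb{R}$, $1/8 < \alpha < 1/6$ and where $\mathcal{H}_n$ is of cardinality $k_n$
satisfying $k_n n^{-4\alpha} \to 0$.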
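
The moment conditions behind the "fourth-order" requirement in (A1) can be checked numerically. The sketch below does so for the kernel $K = 2k - k * k$, with $k$ the Epanechnikov kernel, which is the kernel used in the simulation study of this paper. It illustrates the moment conditions only (this kernel is supported on $[-2, 2]$, so the support condition would require rescaling); the midpoint-rule integration and the step sizes are choices made for this example.

```python
def k(u):
    """Epanechnikov kernel."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def conv_kk(u, step=0.01):
    """Midpoint-rule approximation of (k * k)(u), the integral of k(u - s) k(s) over [-1, 1]."""
    m = int(round(2.0 / step))
    total = 0.0
    for j in range(m):
        s = -1.0 + (j + 0.5) * step
        total += k(u - s) * k(s)
    return total * step

def K(u):
    """Twicing kernel K = 2k - k * k."""
    return 2.0 * k(u) - conv_kk(u)

def moment(p, step=0.01):
    """Midpoint-rule approximation of the integral of u**p K(u) over [-2, 2]."""
    m = int(round(4.0 / step))
    total = 0.0
    for j in range(m):
        u = -2.0 + (j + 0.5) * step
        total += (u ** p) * K(u)
    return total * step

m0, m2, m4 = moment(0), moment(2), moment(4)
```

Here `m0` is close to 1 and `m2` is close to 0, while `m4` is close to $-6\mu_2^2 = -0.24$, where $\mu_2 = 1/5$ is the second moment of the Epanechnikov kernel: the first non-vanishing moment is the fourth one.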

2.4 The trimming function J 

The reason for introducing the function $J$ is the need to avoid
denominators close to zero in definition (4). Ideally, we would need to use the following
trimming function,

$J_0(x, c) = J(f_{\theta_0'X}, \theta_0'x, c)$, (5)

where $c$ is a strictly positive constant, $f_{\theta_0'X}$ denotes the density of $\theta_0'X$ and $J(g, u, c) = 1_{g(u) > c}$.
Unfortunately, this function relies on the knowledge of the parameter $\theta_0$ and of $f_{\theta_0'X}$. Therefore, we
will have to proceed in two steps, that is, first obtain a preliminary consistent estimator of $\theta_0$,
and then use it to estimate the trimming function $J_0$, which will be needed to achieve asymptotic
normality of our estimators of $\theta_0$.



We will assume that we know some set $B$ on which $\inf\{f_{\theta'X}(\theta'x) : x \in B, \theta \in \Theta\} > c$,
where $c$ is a strictly positive constant. In a preliminary step, we can use this set $B$ to compute
the preliminary trimming $J_B(x) = 1_{x \in B}$. Using this trimming function, and a deterministic
sequence of bandwidths $h_0$ satisfying (A4) in Assumption 1, we define a preliminary estimator
$\theta_n$ of $\theta_0$,

$\theta_n = \arg\max_{\theta \in \Theta} L_n^\tau(\theta, \hat f^{h_0,\tau}, J_B)$. (6)

Let us stress the fact that B is assumed to be known by the statistician. This is a classical 
assumption in single-index regression (see Delecroix et al., (2006)). However, in practice, the 
procedure does not seem very sensitive to the choice of B. The bandwidth ho we consider in the 
preliminary step can be any sequence decreasing to zero slower than n~^". Adaptive choice of 
$h_0$ could be considered (using, for instance, the same choice as in the final estimation step, see
below). However, since we only need $\theta_n$ to be a preliminary consistent estimator, and the
final estimator is not very sensitive to an adaptive choice of $h_0$ when computing $\theta_n$, we do
not consider this case in the following.

With this preliminary estimator $\theta_n$ at hand, we can compute an estimated version of $J_0$
which turns out to be equivalent to $J_0$ (see Delecroix et al. (2006), page 738), that is

$\hat J_0(x, c) = J(\hat f^{h_0}_{\theta_n'X}, \theta_n'x, c)$. (7)

For each sequence of bandwidths satisfying (A4) in Assumption 1, and for each truncation
bound $\tau$, we can define an estimator of $\theta_0$,

$\hat\theta^\tau(h) = \arg\max_{\theta \in \Theta_n} L_n^\tau(\theta, \hat f^{h,\tau}, \hat J_0)$, (8)

where $\Theta_n$ is a shrinking sequence of neighborhoods chosen according to the preliminary estimation.
However, as for any smoothing approach, the performance of this procedure strongly depends
on the bandwidth sequence. Therefore it becomes particularly relevant to provide an approach
which automatically selects the bandwidth best adapted to the data. A further
question, specific to the censored framework, is the adaptive choice of the truncation
parameter $\tau$.

2.5 Adaptive choice of the bandwidth 

Our procedure consists of choosing from the data, for each $\theta$, a bandwidth which is adapted to
the computation of $f^\tau_\theta(z, u)$. For this, we use an adaptation of the cross-validation technique of
Fan and Yim (2004).

The resulting criterion is (up to a quantity which does not depend on $h$) an empirical version of the ISE
criterion defined in (3.3) in Fan and Yim (2004) (in a censored framework), that is
$\int\int (\hat f^{h,\tau}_\theta(z, \theta'x) - f^\tau_\theta(z, \theta'x))^2 f_{\theta'X}(\theta'x) \, dx \, dz$.

The estimator of $\theta_0$ with an adaptive bandwidth is now defined as

$\hat\theta^\tau = \arg\max_{\theta \in \Theta_n} L_n^\tau(\theta, \hat f^{\hat h,\tau}, \hat J_0)$. (9)

In the above notation, $\hat h$ depends on $\theta$ and $\tau$, which is not made explicit in order to shorten the notation.

2.6 Adaptive choice of $\tau$

As we already mentioned, the Kaplan-Meier estimator does not behave well in the tail of the
distribution. For example, if some moment conditions are not satisfied, it is not even
$n^{1/2}$-consistent. Moreover, even in the case where an appropriate moment condition holds, it may
happen (at least for a finite sample size) that the weights corresponding to the large observations
are too large and considerably influence the estimation procedure. For this reason, we
introduced a truncation by a bound $\tau$. However, many existing procedures which
also rely on this kind of truncation do not consider the problem of choosing $\tau$ from the data.
We propose to select $\tau$ from the data in the following way. Suppose that we have a consistent
estimator of the asymptotic mean squared error,

$E^2(\tau) = \limsup_n E[\|\hat\theta^\tau - \theta_0\|^2]$,

say $\hat E^2(\tau)$, satisfying

$\sup_{\tau_1 \le \tau \le \tau_0} |\hat E^2(\tau) - E^2(\tau)| \to 0$, in probability. (10)

Such an estimator will be proposed in Section 4. Using this empirical estimator, we propose to
choose $\tau$ as

$\hat\tau = \arg\min_{\tau_1 \le \tau \le \tau_0} \hat E^2(\tau)$.

Our final estimator of $\theta_0$ is based on an adaptive bandwidth and an adaptive choice of the
truncation parameter $\tau$, that is

$\hat\theta = \hat\theta^{\hat\tau}$.

As we already said, truncating the data does not introduce additional bias in the estimation
of $\theta_0$. On the other hand, removing too many data points could strongly increase the variance,
whereas removing some of the largest data points decreases it. Our selection procedure for
$\hat\tau$ is therefore based on estimating the variance of $\hat\theta$, and consists of taking from the data the truncation
parameter $\tau$ that seems to be the best compromise between these two aspects.
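
In practice the minimization over $\tau_1 \le \tau \le \tau_0$ can be carried out over a finite grid of candidate bounds. The sketch below shows only this selection step; the grid and the criterion curve are purely hypothetical placeholders standing in for an estimated mean squared error, not quantities taken from this paper.

```python
def select_tau(candidates, mse_hat):
    """Return the candidate truncation bound that minimizes the estimated criterion."""
    return min(candidates, key=mse_hat)

# Hypothetical grid of candidate bounds between tau_1 = 1 and tau_0 = 4.
candidates = [1.0 + 0.25 * j for j in range(13)]

def toy_criterion(tau):
    # Placeholder U-shaped curve: a term that decreases as more data are kept,
    # plus a term that grows as unstable tail observations are included.
    return 1.0 / tau + 0.05 * tau * tau

tau_hat = select_tau(candidates, toy_criterion)
```

With this placeholder curve the selected bound is an interior point of the grid, which mimics the compromise described above.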

3 Asymptotic results 

3.1 Consistency 

The assumptions needed for consistency can basically be split into three categories: identifiability
assumptions, assumptions on the regression model itself, and assumptions
on the censoring model.

Identifiability assumption and assumption on the regression model.

Assumption 2. Assume that for all $\tau_1 \le \tau \le \tau_0$ and all $\theta \in \Theta - \{\theta_0\}$,

$L^\tau(\theta_0, J_B) - L^\tau(\theta, J_B) > 0$.

Assumption 3. Assume that for $\theta_1, \theta_2 \in \Theta$, for a bounded function $\Phi(X)$ and for some $\gamma > 0$,
we have

$\sup_\tau |f^\tau_{\theta_1}(y, \theta_1'x) - f^\tau_{\theta_2}(y, \theta_2'x)| \le \|\theta_1 - \theta_2\|^\gamma \Phi(X)$.

Assumptions on the censoring model.

Assumption 4. $P(Y = C) = 0$.

This classical assumption in a censored framework avoids problems caused by the lack of
symmetry between $C$ and $Y$ in the case where there are ties.

Assumption 5. Identifiability assumption: we assume that

- $Y$ and $C$ are independent.

- $P(Y \le C | X, Y) = P(Y \le C | Y)$.

This last assumption was initially introduced by Stute (1993). An important particular case
in which Assumption 5 holds is when $C$ is independent of $(X, Y)$. However, Assumption 5 is a
more general and widely accepted assumption, which allows the censoring variables to depend
on the covariates.




Theorem 1. Under Assumptions\^ to\^ 



$\sup_{\theta \in \Theta, \tau_1 \le \tau \le \tau_0} |L_n^\tau(\theta, \hat f^{h_0,\tau}, J_B) - L^\tau(\theta, J_B)| = o_P(1)$, (11)

and consequently,

$\theta_n \to_P \theta_0$.



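Proof. To show (11), we will proceed in two steps. First we consider $L_n^\tau(\theta, f^\tau, J_B) - L^\tau(\theta, J_B)$
(parametric problem), and then $L_n^\tau(\theta, \hat f^{h_0,\tau}, J_B) - L_n^\tau(\theta, f^\tau, J_B)$.

Step 1. From Assumption 3, the family $\{\log f^\tau_\theta(\cdot, \theta'\cdot), \theta \in \Theta, \tau_1 \le \tau \le \tau_0\}$ is seen to be
$P$-Glivenko-Cantelli. Using a uniform version of Stute (1993) leads to
$\sup_{\theta,\tau} |L_n^\tau(\theta, f^\tau, J_B) - L^\tau(\theta, J_B)| \to_P 0$.

Step 2. We have, on the set $\Theta'B$, with $u = \theta'x$,

$|\log \hat f^{h_0,\tau}_\theta(y, u) - \log f^\tau_\theta(y, u)| \le c^{-1} |\hat f^{h_0,\tau}_\theta(y, u) - f^\tau_\theta(y, u)|$.

Hence,

$\sup_{\theta,\tau} |L_n^\tau(\theta, \hat f^{h_0,\tau}, J_B) - L_n^\tau(\theta, f^\tau, J_B)| \le c^{-1} \sup_{\theta,y,u,\tau} |\hat f^{h_0,\tau}_\theta(y, u) - f^\tau_\theta(y, u)| 1_{u \in \Theta'B, y \le \tau} \int d\hat F(x,y)$

$\le c^{-1} \sup_{\theta,y,u,\tau} |\hat f^{h_0,\tau}_\theta(y, u) - f^\tau_\theta(y, u)| 1_{u \in \Theta'B, y \le \tau}$.

Using the uniform convergence of $\hat f^{h_0,\tau}_\theta$ (see Proposition 6 and Lemma 7), we deduce that
$\sup_{\theta,\tau} |L_n^\tau(\theta, \hat f^{h_0,\tau}, J_B) - L_n^\tau(\theta, f^\tau, J_B)| \to_P 0$. □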

3.2 Asymptotic normality 

To obtain the asymptotic normality of our estimator, we need to add some regularity assumptions 
on the regression model. 

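Assumption 6. Denote by $\nabla_\theta f^\tau_\theta(y,x)$ (resp. $\nabla^2_\theta f^\tau_\theta(y,x)$) the vector of partial derivatives (resp.
the matrix of second derivatives) of $f^\tau_\theta$ with respect to $\theta$, computed at the point
$(\theta, x, y)$. Assume that for $\theta_1, \theta_2 \in \Theta$, for a bounded function $\Phi(X)$ and for some $\gamma > 0$, we have

$\sup_\tau \|\nabla^2_\theta f^\tau_{\theta_1}(y,x) - \nabla^2_\theta f^\tau_{\theta_2}(y,x)\|_\infty \le \|\theta_1 - \theta_2\|^\gamma \Phi(X)$.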

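Assumption 7. Using the notation of Van der Vaart and Wellner (1996) in Section 2.7, define

$\mathcal{H}_1 = C^{1+\delta}(\theta_0'\mathcal{X} \times A_\tau, M)$,

$\mathcal{H}_2 = x\,C^{1+\delta}(\theta_0'\mathcal{X} \times A_\tau, M) + C^{1+\delta}(\theta_0'\mathcal{X} \times A_\tau, M)$.

Assume that $f^\tau_{\theta_0}(\cdot, \cdot) \in \mathcal{H}_1$ (as a function of $\theta_0'x$ and $y$) and $\nabla_\theta f^\tau_{\theta_0}(\cdot, \cdot) \in \mathcal{H}_2$.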






If the family of functions $f^\tau_\theta$ was known (parametric problem), the asymptotic normality of
$\hat\theta$ could be deduced from elementary results on Kaplan-Meier integrals (see Section 7 for a
brief review of these results), as in Stute (1999) or in Delecroix et al. (2008). Using this kind
of result, we can derive the following Lemma (see Section 7 for the proof), which is sufficient
to obtain the asymptotic law of $\hat\theta$ in the parametric case from Theorems 1 and 2 of
Sherman (1994).



Lemma 2. Under Assumptions\^ and\% we have the following representations: 

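1. on $o_P(1)$-neighborhoods of $\theta_0$,

$L_n^\tau(\theta, f^\tau, J_0) = L^\tau(\theta, J_0) + (\theta - \theta_0)'T_{1n}(\theta) + (\theta - \theta_0)'T_{2n}(\theta)(\theta - \theta_0) + T_{3n}(\theta) + T_{4n}(\theta_0)$,

with $\sup_{\theta,\tau} |T_{1n}| = O_P(n^{-1/2})$, $\sup_{\theta,\tau} |T_{2n}| = o_P(1)$, $\sup_{\theta,\tau} |T_{3n}| = O_P(n^{-1})$ and $T_{4n}(\theta_0) =
L_n^\tau(\theta_0, f^\tau, J_0)$.

2. on $O_P(n^{-1/2})$-neighborhoods of $\theta_0$,

$L_n^\tau(\theta, f^\tau, J_0) = n^{-1/2}(\theta - \theta_0)'W_{n,\tau} - \frac{1}{2}(\theta - \theta_0)'V_\tau(\theta - \theta_0) + T_{4n}(\theta_0) + T_{5n}(\theta)$,

with $\sup_{\theta,\tau} |T_{5n}| = o_P(n^{-1})$, and defining $f_1(x,y) = f^{\tau\,-1}_{\theta_0}(y, \theta_0'x) J_0(x,c) \nabla_\theta f^\tau_{\theta_0}(y,x)$,

$W_{n,\tau} = \frac{1}{n^{1/2}} \sum_{i=1}^n \psi(Z_i, \delta_i, X_i; f_1 1_{A_\tau})$,

$V_\tau = E[f^{\tau\,-2}_{\theta_0}(Y, \theta_0'X) J_0(X,c) \nabla_\theta f^\tau_{\theta_0}(Y,X) \nabla_\theta f^\tau_{\theta_0}(Y,X)' 1_{Y \in A_\tau}]$,

where $\psi$ is defined in Theorem 4.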

In the following Theorem, we show that the semiparametric estimator proposed in section [2] 
has the same asymptotic law as in the fully parametric case. 

Theorem 3. Define t* = argmin^ £'^(r). Under Assumptions U\ to\% we have the following 
asymptotic i.i.d. representation, 






(12)



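where $V_\tau$ and $W_{n,\tau}$ are defined in Lemma 2. As a consequence,

$n^{1/2}(\hat\theta - \theta_0) \Rightarrow \mathcal{N}(0, \Sigma_{\tau^*})$,

where $\Sigma_{\tau^*} = V_{\tau^*}^{-1} \Delta_{\tau^*}(f_1) V_{\tau^*}^{-1}$, $\Delta_{\tau^*}(f_1) = Var(\psi(Z, \delta, X; f_1 1_{A_{\tau^*}}))$ and $f_1$ is defined in Lemma 2.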
This Theorem is a consequence of the Main Lemma below. This result shows that, asymptotically
speaking, maximizing $L_n^\tau(\theta, \hat f^{h,\tau}, J)$ is equivalent to maximizing $L_n^\tau(\theta, f^\tau, J)$.

Main Lemma 1. Under Assumptions[^to\^ 

Li{e, /'^'^ Jo) = Li{e, r, Jo) + {6- eoYRmie, h, t) + {9- e^)'R2n{o, K T){e - 9o) + l;xoo), 



where 



sup Rin{0,h,T) = op{n 

eee„,/iGW„,ri<r<ro 

sup R2n{0,h,T) = Op(l). 

6'Gen,/ieW„,ri<T<ro 



-1/2N 



and 

^n(^o) = ^ln(^0, / '^) - Bln{9o, f '^) 

where A'[^{6o, / '^) and BI^{6q, / ''^) are defined in the proof of this Lemma. 



In view of Theorems 1 and 2 of Sherman (1994), this result will allow us to obtain the rate of
convergence of our estimators, and to show that their asymptotic law is the same as the asymptotic
law in the parametric problem.

Proof of Theorem 3. Define



Ton{e,T,h) = L;xo,f''^\Jo), 
r2ni9) = LUe,f''\Jo)- 

We now apply Theorems 1 and 2 in Sherman (1994) to $\Gamma_{in}$, for $i = 0, 1, 2$. From our Main
Lemma and Lemma 2 we deduce that the representation (11) in Theorem 2 of Sherman (1994) holds
for $i = 0, 1, 2$, on $O_P(n^{-1/2})$-neighborhoods of $\theta_0$, with $W_n$ and $V$ defined in Lemma 2. The
asymptotic representation (12) is a by-product of the proof of Theorem 2 in Sherman (1994)
and of the i.i.d. representations of Kaplan-Meier integrals (see Theorem 4). □



4 Simulation study and real data analysis 

4.1 Practical implementation of the adaptive choice of $\tau$

From the proof of Theorem 3, we have the representation

1 " 

n ^ — ^ 



i=l 
As in Stute (1995), the function $\psi$ of Theorem 4 can be estimated from the data in the following
way,



1 - G{Z-) J 1 - H{y) 

where $\hat f_1$ is our kernel estimator of $f_1$ and $\hat H$ is the empirical estimator of $H$. To consistently
estimate $\Delta(f_1)$, we use the general technique proposed by Stute (1996), that is

2 



(13) 



Ar(/i) = -V V^(Zi,,5„X,;/i)--V^(Z,,,5„X,;/i) 
n ^-^ n ^-^ 

where $\otimes 2$ denotes the product of the matrix with its transpose. A consistent estimator of $V_\tau$
can then be computed as



To estimate the asymptotic mean squared error we use 

n 

4.2 Simulation study 

In order to check the finite sample behavior of our estimators of $\theta_0$, we conducted some
simulations using a model similar to the one in Delecroix et al. (2003). We considered the following
regression model,

$Y_i = \theta_0'X_i + \varepsilon_i$, $i = 1, \ldots, n$,

where $Y_i \in \mathbb{R}$, $\theta_0 = (1, 0.5, 1.4, 0.2)'$ and $X_i \sim \otimes^4\{0.2\,\mathcal{N}(0, 1) + 0.8\,\mathcal{N}(0.25, 2)\}$. The errors are
centered and normally distributed with conditional variance equal to $|\theta_0'X|$. We used the kernel

$K(u) = 2k(u) - k * k(u)$,

where $*$ denotes the convolution product and

$k(u) = \frac{3}{4}(1 - u^2) 1_{|u| \le 1}$

is the classical Epanechnikov kernel. The censoring distribution was selected to be exponential
with parameter $\lambda$, which allows us to fix the proportion of censored responses ($p = 25\%$ and
$p = 40\%$ in our simulations). $h$ was chosen using a regular grid between 1 and 1.5.
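
A data-generating step consistent with this description can be sketched as follows. Several details are assumptions of this sketch and are not stated in the text: the second argument of $\mathcal{N}(0.25, 2)$ is read as a variance, `lam` is treated as the rate of the exponential censoring law, and the value `lam=0.2` is arbitrary rather than calibrated to a 25% or 40% censoring proportion.

```python
import random

THETA0 = (1.0, 0.5, 1.4, 0.2)

def draw_covariate(rng):
    """One draw from the mixture 0.2 N(0, 1) + 0.8 N(0.25, 2), with 2 read as a variance."""
    if rng.random() < 0.2:
        return rng.gauss(0.0, 1.0)
    return rng.gauss(0.25, 2.0 ** 0.5)

def simulate(n, lam, seed=0):
    """Return a list of (Z_i, delta_i, X_i) triples from the censored single-index model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = tuple(draw_covariate(rng) for _ in range(4))
        index = sum(t * v for t, v in zip(THETA0, x))
        y = index + rng.gauss(0.0, abs(index) ** 0.5)  # conditional variance |theta0'x|
        c = rng.expovariate(lam)                       # exponential censoring time
        data.append((min(y, c), 1 if y <= c else 0, x))
    return data

sample = simulate(200, lam=0.2, seed=1)
censoring_rate = 1.0 - sum(d for _, d, _ in sample) / len(sample)
```

To reach a target censoring proportion, one would adjust `lam` by trial and error on large simulated samples and then keep it fixed.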

Our estimator $\hat\theta^{\hat\tau}$ was compared with two other estimators, that is $\hat\theta^\infty$, which does not rely
on an adaptive choice of $\tau$, and $\hat\theta^{ADE}$, which is obtained using the average derivative method
of Lu and Burke (2005). In the tables below we report our results over 100 simulations from
samples of size 100 and 200 for two different rates of censoring. Recalling that the first component
of $\theta_0$ is imposed to be one, we only have to estimate the three other components. For each
estimator, the mean squared error $E(\|\hat\theta - \theta_0\|^2)$ is decomposed into bias and variance.
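
The decomposition used in the tables is $E\|\hat\theta - \theta_0\|^2 = \|\text{bias}\|^2 + \mathrm{trace}(\text{variance})$. As a quick consistency check, the sketch below recomputes the MSE of the average derivative estimator for $p = 25\%$, $n = 100$ from the rounded bias and variance entries reported in Table 1; the function name is chosen for this example.

```python
def mse_from_bias_and_variance(bias, variance):
    """Squared norm of the bias plus the trace of the variance matrix."""
    squared_bias = sum(b * b for b in bias)
    trace = sum(variance[i][i] for i in range(len(bias)))
    return squared_bias + trace

# Rounded entries reported in Table 1 for the average derivative estimator.
bias_ade = (-0.112, -0.551, -0.155)
var_ade = (
    (0.14, 0.005, -0.022),
    (0.005, 0.075, 0.016),
    (-0.022, 0.016, 0.116),
)
mse_ade = mse_from_bias_and_variance(bias_ade, var_ade)
```

The recomputed value agrees with the reported MSE of 0.6714181 up to the rounding of the table entries.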



p = 25%, n = 100

Estimator                  Bias      Variance                      MSE
$\hat\theta^{ADE}$        -0.112     0.14     0.005   -0.022       0.6714181
                          -0.551     0.005    0.075    0.016
                          -0.155    -0.022    0.016    0.116
$\hat\theta^{\infty}$      0.057     0.033    0.012    0.001       0.1841227
                           0.215     0.012    0.073   -0.004
                           0.048     0.001   -0.004    0.027
$\hat\theta^{\hat\tau}$    0.07      0.034    0.002    0.002       0.1825980
                           0.221     0.002    0.074
                           0.028     0.002             0.02

Table 1: Biases, variances and mean squared errors for 25% of censoring and samples of size
100.



p = 40%, n = 100

Estimator                  Bias      Variance                      MSE
$\hat\theta^{ADE}$        -0.334     0.159    0.009   -0.014       1.280163
                          -0.743     0.009    0.268    0.048
                          -0.158    -0.014    0.048    0.165
$\hat\theta^{\infty}$      0.127     0.11    -0.034   -0.01        0.3829797
                           0.296    -0.034    0.101    0.021
                           0.096    -0.01     0.021    0.059
$\hat\theta^{\hat\tau}$    0.074     0.064   -0.005   -0.004       0.2239023
                           0.176    -0.005    0.051    0.014
                           0.061    -0.004    0.014    0.069

Table 2: Biases, variances and mean squared errors for 40% of censoring and samples of size
100.
p = 25%, n = 200

Estimator                  Bias      Variance                      MSE
$\hat\theta^{ADE}$        -0.189     0.096    0.003    0.006       0.7620268
                          -0.578     0.003    0.148   -0.016
                          -0.133     0.006   -0.016    0.131
$\hat\theta^{\infty}$      0.073     0.033    0.004   -0.004       0.0910719
                           0.133     0.004    0.023    0.002
                           0.015    -0.004    0.002    0.012
$\hat\theta^{\hat\tau}$    0.034     0.007    0.001    0.004       0.0364064
                           0.107     0.001    0.011
                           0.014                       0.006

Table 3: Biases, variances and mean squared errors for 25% of censoring and samples of size
200.



p = 40%, n = 200

Estimator                  Bias      Variance                      MSE
$\hat\theta^{ADE}$        -0.109     0.146   -0.02     0.056       1.078027
                          -0.763    -0.02     0.143   -0.014
                          -0.053     0.056   -0.014    0.192
$\hat\theta^{\infty}$      0.104     0.109    0.008    0.042       0.2521227
                           0.151     0.008    0.049    0.003
                           0.077     0.042    0.003    0.055
$\hat\theta^{\hat\tau}$    0.043     0.018   -0.001    0.002       0.07533921
                           0.14     -0.001    0.022    0.002
                           0.021     0.002    0.002    0.014

Table 4: Biases, variances and mean squared errors for 40% of censoring and samples of size
200.

To give a precise idea of the number of observations which are removed from the study
by choosing $\tau$ adaptively, introduce $N = \#\{1 \le i \le n, Z_i \le \hat\tau\}$. In Table 5 we
evaluated $E[N]$ in the different cases we considered in the simulation study. We also report
the average weight allocated to the largest (uncensored) data point, first in the case where we
consider the whole data set (we denote it Weight$^{\infty}$), then in the truncated data set where we
removed all data points with $Z_i > \hat\tau$ (we denote it Weight$^{\hat\tau}$).





                      $E[N]$    Weight$^{\infty}$    Weight$^{\hat\tau}$
n = 100, p = 25%       90       0.0667               0.0204
n = 100, p = 40%       87       0.124                0.0236
n = 200, p = 25%      185       0.0402               0.0119
n = 200, p = 40%      172       0.0997               0.0122

Table 5: Last observed data in the truncated model and weight allocated to the largest
observation in each model for different sample sizes and censoring rates.

Clearly the MSE deteriorates when the percentage of censoring increases. According to the
simulations, $\hat\theta^{\hat\tau}$ and $\hat\theta^{\infty}$ outperform $\hat\theta$, while, as expected, choosing $\tau$ adaptively improves
the quality of the estimation. The improvement is not obvious when there is only 25% of
censoring. However, when the level of censoring is high, estimation of the tail
of the distribution by Kaplan-Meier estimators becomes more erratic, and the importance of
choosing a proper truncation appears in the significant difference between the MSE of $\hat\theta^{\hat\tau}$ and
$\hat\theta^{\infty}$. Moreover, the importance of truncation becomes obvious if we look at Table 5. We see that,
with 40% of censoring, the weight allocated to the largest data point if we
do not use truncation can be up to approximately 10 times the weight allocated to the largest
observation in the truncated data set. The ratio is smaller with 25% of censoring,
but still substantial (in this case, the ratio is approximately 3). Therefore,
it seems that, considering the whole data set, the weight allocated to the largest observation
can have too strong an influence on the estimation procedure, which explains the difference in
performance between the estimators with and without truncation.
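
The comparison of weights can be reproduced directly from the entries of Table 5. The following is a minimal sketch: the numbers are copied from the table, while the variable and function names are illustrative and not part of the original study.

```python
# Values copied from Table 5: (n, censoring %, Weight^inf, Weight^tau_hat).
table5 = [
    (100, 25, 0.0667, 0.0204),
    (100, 40, 0.124, 0.0236),
    (200, 25, 0.0402, 0.0119),
    (200, 40, 0.0997, 0.0122),
]


def weight_ratios(rows):
    """Return {(n, p): Weight^inf / Weight^tau_hat} for each design."""
    return {(n, p): w_inf / w_tau for n, p, w_inf, w_tau in rows}


ratios = weight_ratios(table5)
for (n, p), ratio in ratios.items():
    print(f"n = {n}, p = {p}%: ratio = {ratio:.2f}")
```

With these entries the ratio is a little above 3 for both designs with 25% of censoring, and it is largest (roughly 8) for n = 200 with 40% of censoring.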



4.3 Example: Stanford Heart Transplant Data

We now illustrate our method using data from the Stanford Heart Transplant program. This
data set was initially studied by Miller and Halpern (1982). 184 of 249 patients in this program
received a heart transplantation between October 1967 and February 1980. From this data, we
considered the survival time as the response variable Z, age as the first component of X and the
square of age as the second component. Patients alive beyond February 1980 were considered
censored. For easier comparison to previous work on this data set, we concentrate our analysis
on the 157 patients out of 184 who had complete tissue typing. Among these 157 cases, 55 were
censored.

Several estimation methods have already been applied to this data set to estimate the
following regression model,

$$Z = \alpha + \beta'X + \varepsilon(X), \qquad (14)$$

where $\beta = (\beta_1,\beta_2)'$ and $E[\varepsilon(X) \mid X] = 0$; see Miller and Halpern (1982), Wei et al. (1990) and Stute et
al. (2000). Furthermore, nonparametric lack-of-fit tests have shown that the regression model
(14) seemed reasonable, see Stute et al. (2000) and Lopez and Patilea (2008). Therefore it seems
appropriate to us to try our model on this data set. Our model strengthens the assumption
on the residual, by assuming that $\varepsilon(X) = \varepsilon(\theta_0'X)$, where $\theta_0 = (1,\beta_2/\beta_1)'$, but allows more
flexibility on the regression function.

In the following table, we present our estimators and recall the values of the estimators of
$\beta_2/\beta_1$ for the linear regression model (14). We first computed $\hat\theta^{\infty}$, which is our estimator using
the whole data set, that is with $\tau = +\infty$, and compared it to the one obtained by choosing $\tau$
from the data as in Section 4.1. In this last case, $\hat\tau = Z_{(90)}$, where $Z_{(i)}$ denotes the $i$-th order
statistic; this means that we removed the 67 largest observations to estimate $\theta_0$
(but not to estimate the Kaplan-Meier weights, which were computed using the whole data set).
We computed Weight$^{\infty} = 0.0397$, and Weight$^{\hat\tau} = 0.0076$ for the truncated data. The adaptive
bandwidth was 1.7 for $\hat\theta^{\infty}$, and 1.3 for $\hat\theta^{\hat\tau}$. The estimated value of the mean-squared error was
0.1089375 for $\hat\theta^{\infty}$ and 0.01212701 for $\hat\theta^{\hat\tau}$.
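
To make the link between the truncation point and the number of discarded observations concrete, here is a minimal sketch of truncation at an order statistic. The helper name and the synthetic observation times are hypothetical (the Stanford data are not reproduced here); only the sample size 157 and the rank 90 come from the text above.

```python
def truncate_at_order_statistic(z, k):
    """Truncate at tau_hat = Z_(k), the k-th order statistic of z.

    Returns (tau_hat, kept, removed): the truncation point, the observations
    with Z_i <= tau_hat, and the number of observations above tau_hat.
    """
    ordered = sorted(z)
    tau_hat = ordered[k - 1]
    kept = [value for value in z if value <= tau_hat]
    removed = len(z) - len(kept)
    return tau_hat, kept, removed


# Hypothetical, distinct observed times; only n = 157 and k = 90 are from the text.
z = [float(i) + 0.5 for i in range(157)]
tau_hat, kept, removed = truncate_at_order_statistic(z, 90)
print(len(kept), removed)
```

With 157 distinct observations and the 90th order statistic as truncation point, 90 observations are kept and 67 are removed, which matches the count reported above.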





                                                              Estimator of $\theta_{0,2} = \beta_2/\beta_1$
Miller and Halpern                                            -0.01588785
Wei et al.                                                    63.75
Stute et al.                                                  -0.01367034
$\hat\theta^{\infty}$ (without adaptive choice of $\tau$)     -0.07351351
$\hat\theta^{\hat\tau}$ (with adaptive choice of $\tau$)      -0.0421508

Table 6: Comparison of different estimators of $\theta_{0,2}$.



Our estimators seem relatively close to the ones obtained by Miller and Halpern (1982) and
Stute et al. (2000), using respectively the Buckley-James method and the Kaplan-Meier integrals
method for the linear regression model.






5 Proof of Main Lemma 



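First, the same arguments as in Delecroix et al. (2006) apply to replace $\hat J$ by $J_0$. Define
$J_\theta(x,c) = \mathbf{1}_{f^\tau_{\theta'X}(\theta'x) > c}$. From Assumption 3 on the density of $\theta'x$, deduce that, on shrinking
neighbourhoods of $\theta_0$, $J_0(x,c)$ can be replaced by $J_\theta(x,c/2)$. Using a Taylor expansion, write

$$
\begin{aligned}
L_n^\tau(\theta,\hat f^{h,\tau},J_0) - L_n^\tau(\theta,f^\tau,J_0)
&= \sum_{i=1}^n \delta_i W_{in}\mathbf{1}_{Z_i\in A_\tau}\log\frac{\hat f^{h,\tau}_\theta(Z_i,\theta'X_i)}{f^\tau_\theta(Z_i,\theta'X_i)}\,J_0(X_i,c) \\
&= \sum_{i=1}^n \frac{\delta_i W_{in}\mathbf{1}_{Z_i\in A_\tau}\,[\hat f^{h,\tau}_\theta(Z_i,\theta'X_i) - f^\tau_\theta(Z_i,\theta'X_i)]\,J_0(X_i,c)}{f^\tau_\theta(Z_i,\theta'X_i)} \\
&\quad - \sum_{i=1}^n \frac{\delta_i W_{in}\mathbf{1}_{Z_i\in A_\tau}\,[\hat f^{h,\tau}_\theta(Z_i,\theta'X_i) - f^\tau_\theta(Z_i,\theta'X_i)]^2\,J_0(X_i,c)}{\phi(f^\tau_\theta(Z_i,\theta'X_i),\hat f^{h,\tau}_\theta(Z_i,\theta'X_i))^2},
\end{aligned}
$$

where $\phi(f^\tau_\theta(Z_i,\theta'X_i),\hat f^{h,\tau}_\theta(Z_i,\theta'X_i))$ is between $\hat f^{h,\tau}_\theta(Z_i,\theta'X_i)$ and $f^\tau_\theta(Z_i,\theta'X_i)$. We denote by $A_{1n}$ the first sum and by $B_{1n}$ the second sum on the right-hand side.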

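Step 1. We first study $A_{1n}$. A Taylor expansion leads to the following decomposition:

$$
\begin{aligned}
A_{1n} &= (\theta-\theta_0)'\sum_{i=1}^n \frac{\delta_i W_{in}\mathbf{1}_{Z_i\in A_\tau}\,(\nabla_\theta \hat f^{h,\tau}_{\theta_0}(Z_i,X_i) - \nabla_\theta f^{\tau}_{\theta_0}(Z_i,X_i))\,J_\theta(X_i,c/2)}{f^\tau_\theta(Z_i,\theta'X_i)} \\
&\quad + (\theta-\theta_0)'\left[\sum_{i=1}^n \frac{\delta_i W_{in}\mathbf{1}_{Z_i\in A_\tau}\,(\nabla^2_\theta \hat f^{h,\tau}_{\tilde\theta}(Z_i,X_i) - \nabla^2_\theta f^{\tau}_{\tilde\theta}(Z_i,X_i))\,J_\theta(X_i,c/2)}{2 f^\tau_\theta(Z_i,\theta'X_i)}\right](\theta-\theta_0) \\
&\quad + \sum_{i=1}^n \frac{\delta_i W_{in}\mathbf{1}_{Z_i\in A_\tau}\,(\hat f^{h,\tau}_{\theta_0}(Z_i,\theta_0'X_i) - f^{\tau}_{\theta_0}(Z_i,\theta_0'X_i))}{f^\tau_\theta(Z_i,\theta'X_i)\,f^\tau_{\theta_0}(Z_i,\theta_0'X_i)} \\
&\qquad\quad \times (f^\tau_{\theta_0}(Z_i,\theta_0'X_i) - f^\tau_\theta(Z_i,\theta'X_i))\,J_0(X_i,c)\,J_\theta(X_i,c/2) + A_{1n}(\theta_0,\hat f^{h,\tau}) \\
&= A_{1n}(\theta_0,\hat f^{h,\tau}) + (\theta-\theta_0)'A_{2n}(\theta_0,\hat f^{h,\tau}) + (\theta-\theta_0)'A_{3n}(\tilde\theta,\hat f^{h,\tau})(\theta-\theta_0) + A_{4n}(\theta,\hat f^{h,\tau}),
\end{aligned}
$$

for some $\tilde\theta$ between $\theta$ and $\theta_0$. Observe that, using the uniform consistency of $\nabla^2_\theta \hat f^{h,\tau}_\theta$ (deduced
from Proposition 6 and Lemma 7), we obtain $\sup_{\theta\in\Theta_n,\tau\le\tau_0,h\in\mathcal{H}_n} A_{3n}(\tilde\theta,\hat f^{h,\tau}) = o_P(1)$. We now
study $A_{2n}(\theta_0,\hat f^{h,\tau})$. Using the expression (3) of the jumps of the Kaplan-Meier estimator, observe
that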

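$$
\begin{aligned}
A_{2n}(\theta_0,\hat f^{h,\tau}) &= \sum_{i=1}^n \frac{\delta_i W_i^*\mathbf{1}_{Z_i\in A_\tau}\,(\nabla_\theta \hat f^{h,\tau}_{\theta_0}(Z_i,X_i) - \nabla_\theta f^{\tau}_{\theta_0}(Z_i,X_i))\,J_\theta(X_i,c/2)}{f^\tau_\theta(Z_i,\theta'X_i)} \\
&\quad + \sum_{i=1}^n W_i^* Z_G(Z_i-)\,\frac{\delta_i\mathbf{1}_{Z_i\in A_\tau}\,(\nabla_\theta \hat f^{h,\tau}_{\theta_0}(Z_i,X_i) - \nabla_\theta f^{\tau}_{\theta_0}(Z_i,X_i))\,J_\theta(X_i,c/2)}{f^\tau_\theta(Z_i,\theta'X_i)} \\
&= A_{21n}(\theta,\hat f^{h,\tau}) + A_{22n}(\theta,\hat f^{h,\tau}),
\end{aligned}
$$

where

$$Z_G(t) = \frac{\hat G(t) - G(t)}{1 - \hat G(t)}.$$

The term $A_{22n}$ can be bounded using (21), (22) and Lemma 7 by

$$\sup_{\tau\le\tau_0} |A_{22n}(\theta,\hat f^{h,\tau})| \le o_P(n^{-1/2}) \times n^{-1}\sum_{i=1}^n \delta_i[1 - G(Z_i-)]^{-1},$$

and the last term is $O_P(1)$ since it has finite expectation. Now for $A_{21n}$, first replace $\theta$ at the
denominator by $\theta_0$. We have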


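$$A_{21n}(\theta,\hat f^{h,\tau}) = \sum_{i=1}^n \frac{\delta_i W_i^*\mathbf{1}_{Z_i\in A_\tau}\,(\nabla_\theta \hat f^{h,\tau}_{\theta_0}(Z_i,X_i) - \nabla_\theta f^{\tau}_{\theta_0}(Z_i,X_i))\,J_0(X_i,c/4)}{f^\tau_{\theta_0}(Z_i,\theta_0'X_i)} + R_n(\theta,h)(\theta-\theta_0),$$

with $\sup_{\theta\in\Theta_n,\tau\le\tau_0,h\in\mathcal{H}_n} |R_n(\theta,h)| = o_P(1)$ from Assumption 3 and the uniform consistency of
$\nabla_\theta \hat f^{h,\tau}_\theta$ deduced from Proposition 6 and Lemma 7. Then use Assumption 7 and Proposition 9.
Using the equicontinuity property of Donsker classes (see e.g. Van der Vaart and Wellner (1996) or
Van der Vaart (1998)), we obtain that

$$A_{21n}(\theta_0,\hat f^{h,\tau}) = \int \frac{[\nabla_\theta \hat f^{h,\tau}_{\theta_0}(y,x) - \nabla_\theta f^{\tau}_{\theta_0}(y,x)]\,\mathbf{1}_{y\in A_\tau}\,J_0(x,c/4)}{f^\tau_{\theta_0}(y,\theta_0'x)}\,dF(x,y) + o_P(n^{-1/2}),$$

where the $o_P$-rate does not depend on $\theta$, $h$, nor $\tau$. From classical kernel arguments,
$\sup_{y,x,\tau} \left|\int (\nabla_\theta f^{*h,\tau}_{\theta_0}(y,x) - \nabla_\theta f^{\tau}_{\theta_0}(y,x))\,\mathbf{1}_{y\in A_\tau}\,J_0(x,c/4)\,dF(x,y)\right| = O_P(h^4) = o_P(n^{-1/2})$, since
$nh^8 \to 0$. Then, Lemma 8 concludes the proof for $A_{2n}(\theta,\hat f^{h,\tau})$. $A_{4n}(\theta,\hat f^{h,\tau})$ can be handled
similarly.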
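
The requirement that a bias of order $h^4$ be negligible at the $n^{-1/2}$ scale is the same as $nh^8 \to 0$. The sketch below illustrates this for a polynomial bandwidth $h = n^{-a}$; this parametrisation and the function names are illustrative assumptions, not the bandwidth grid used in the paper.

```python
import math


def sqrt_n_times_bias(n, a):
    """Return n**0.5 * h**4 for the illustrative bandwidth h = n**(-a)."""
    h = n ** (-a)
    return math.sqrt(n) * h ** 4


def n_h8(n, a):
    """Return n * h**8 for the same bandwidth; it equals (n**0.5 * h**4)**2."""
    h = n ** (-a)
    return n * h ** 8


for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    # a = 0.2 > 1/8: both quantities decrease with n.
    # a = 0.1 < 1/8: both quantities increase with n.
    print(n, sqrt_n_times_bias(n, 0.2), sqrt_n_times_bias(n, 0.1))
```

Under this parametrisation $n^{1/2}h^4 = n^{1/2-4a}$, so the bias term is negligible exactly when $a > 1/8$.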

Step 2. $B_{1n}$ can be rewritten as



n 



{{0 - eoy[Vefe;\Zi,Xi) - VefS,iZi,Xi)]}^ 



i=l 



n 

+ 2 Y, ^iWinJeiX^, c/2)lz,6A. [f^;^{Z,,e'QXi) - re^{Zi, O'^X,)] 



i=l 



eoy[Ven'^{Zi,Xi) - VefUZi,Xi)] msiZi, e'Xi), C^{Zi,e'Xi)) 



:,iv ^^2^-l 



+Bueo,r'n+op{\\e-9or) 

: (9 - 9o)'BUeo, M{0 - 00) + {e- 9o)'BUO, M + BUOo, M + op{\\9 



%f) 
for some $\tilde\theta$ between $\theta$ and $\theta_0$. The third term does not depend on $\theta$. For $B_{2n}$, use the uniform
consistency of $\nabla_\theta \hat f^{h,\tau}_\theta$ (Proposition 6 and Lemma 7) to obtain $\sup_{\tau\le\tau_0,h\in\mathcal{H}_n} |B_{2n}(\theta,\hat f^{h,\tau})| = o_P(n^{-1/2})$.
Finally, for $B_{3n}(\theta,\hat f^{h,\tau})$, from a Taylor expansion,

' i^^c^{r,{Zi,9'X,)J^^\Zi,9>Xi)Y 

+ [9-9^)'Rl{9,M, 



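with $\sup_{\theta\in\Theta_n,\tau\le\tau_0,h\in\mathcal{H}_n} R_n(\theta,\hat f^{h,\tau}) = o_P(1)$, from Proposition 6 and Lemma 7. For the main
term, the product of the uniform convergence rates of $\hat f^{h,\tau}_{\theta_0}$ and $\nabla_\theta \hat f^{h,\tau}_{\theta_0}$ obtained from Proposition
6 and Lemma 7 is $o_P(n^{-1/2})$ for $h \in \mathcal{H}_n$.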



6 Conclusion 

We proposed a new estimation procedure for a conditional density under a single-index assumption
and random censoring. This procedure extends the approach of Delecroix et al.
(2003) to the case of a censored response. One advantage of this model is that it relies
on fewer assumptions than the Cox regression model, in the case where the random variables of the
model are absolutely continuous. By showing that estimating in this semiparametric model is
asymptotically equivalent to estimating in a parametric one (unknown in practice), we obtain
an $n^{-1/2}$-rate for the estimator of the index. This estimator can then be used to estimate the
conditional density or the conditional distribution function by using traditional nonparametric
estimators under censoring. A new feature of our procedure is that it provides an adaptive
choice of the bandwidth involved in the kernel estimators we used, as well as
an adaptive choice of the truncation parameter needed to avoid the problems caused by the bad
behavior of Kaplan-Meier estimators in the tail of the distribution. In this specific problem, this
truncation does not introduce any additional bias in the procedure, and seems, according to
our simulations, to increase the quality of the estimator, especially when the
proportion of censored responses is large. Our way of choosing $\tau$ was motivated by minimizing
the MSE in the estimation of $\theta_0$. However, our method could easily be adapted to other
criteria that focus more, for example, on the error in estimating one specific direction, or on
the error in the estimation of the conditional density itself.

7 Appendix

7.1 Kaplan-Meier integrals for the parametric case 

We first recall a classical asymptotic representation of integrals with respect to $\hat F$. See



Stute 



{mm, 



Stute (1996) and Sanchez Sellero et al. (2005).
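Theorem 4. Let $\mathcal{F}$ be a VC-class of functions with envelope $\Phi$ such that

$$\Phi(x,y) = 0, \quad \text{for all } y > \tau_0, \qquad (15)$$

where $\tau_0 < \tau_H$. We have the following asymptotic i.i.d. representation, for all $\phi \in \mathcal{F}$,

$$\int \phi(x,y)\,d\hat F(x,y) = \frac{1}{n}\sum_{i=1}^n \psi(Z_i,\delta_i,X_i;\phi) + R(\phi),$$

where $\sup_{\phi\in\mathcal{F}} |R(\phi)| = O_{a.s.}([\log n]^3 n^{-1})$, and

$$\psi(Z_i,\delta_i,X_i;\phi) = \frac{\delta_i\,\phi(X_i,Z_i)}{1 - G(Z_i-)} + \int \frac{\int_y^{\tau_0}\int_x \phi(x,t)\,dF(x,t)}{1 - H(y)}\,dM_i^G(y),$$

where $M_i^G(y) = (1-\delta_i)\mathbf{1}_{Z_i\le y} - \int_{-\infty}^{y} \mathbf{1}_{Z_i\ge t}[1 - G(t-)]^{-1}\,dG(t)$ is a martingale with respect to the
filtration $\mathcal{G}_y = \{(Z_i,\delta_i,X_i)\mathbf{1}_{Z_i\le y}\}$. Define $A(\phi) = \mathrm{Var}(\psi(Z,\delta,X;\phi))$. Then it follows that

$$n^{1/2}\int \phi(x,y)\,d[\hat F - F](x,y) \Longrightarrow \mathcal{N}(0,A(\phi)).$$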



Initially, the result of Stute was derived for a single function $\phi$. Furthermore, Theorem 1.1 in
Stute (1996) gives a convergence rate which is only $o_P(n^{-1/2})$ for the remainder term; however,
a higher convergence rate is obtained in his proof of Theorem 1.1 for functions satisfying (15),
which is the only case considered in our work. To obtain uniformity on a VC-class of functions,
see Sanchez Sellero et al. (2005), who provided a more general representation that extends the
one of Stute to the case where Y is right-censored and left-truncated. Their result is really
useful since it provides, as a corollary, uniform law of large numbers results and a uniform central
limit theorem. The representation we present in our Theorem 4 is a simple rewriting of Stute's
representation. Theorem 4 is then a key ingredient to prove Lemma 2.
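
As a concrete illustration of a Kaplan-Meier integral, the sketch below computes the weights $\delta_i W_{in}$ and the sum $\sum_i \delta_i W_{in}\phi(X_i,Z_i)$. It assumes the usual convention $W_{in} = 1/(n[1-\hat G(Z_i-)])$, with $\hat G$ the Kaplan-Meier estimator of the censoring distribution, and no ties among the $Z_i$; the function names are hypothetical and this is not the authors' implementation.

```python
def kaplan_meier_weights(z, delta):
    """Return delta_i / (n * (1 - G_hat(Z_i-))) for each i, assuming no ties.

    z: observed times; delta: 1 if uncensored, 0 if censored.
    G_hat is the Kaplan-Meier estimator of the censoring distribution.
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    weights = [0.0] * n
    surv_g = 1.0  # value of 1 - G_hat(t-) just before the current observation
    for rank, i in enumerate(order):
        at_risk = n - rank
        if delta[i] == 1:
            weights[i] = 1.0 / (n * surv_g)
        else:
            # a censored observation is a jump of the censoring estimator
            surv_g *= 1.0 - 1.0 / at_risk
    return weights


def km_integral(phi, x, z, delta):
    """Kaplan-Meier integral of phi: sum_i weight_i * phi(X_i, Z_i)."""
    weights = kaplan_meier_weights(z, delta)
    return sum(w * phi(xi, zi) for w, xi, zi in zip(weights, x, z))


# Toy sample: the middle observation is censored, so its mass moves to the right.
z = [1.0, 2.0, 3.0]
delta = [1, 0, 1]
x = [0.0, 0.0, 0.0]
print(kaplan_meier_weights(z, delta))
print(km_integral(lambda xi, zi: zi, x, z, delta))
```

In the toy sample the censored point receives no weight and the largest uncensored point receives twice the weight of the smallest, which is the mechanism behind the large Weight$^{\infty}$ values discussed in the simulation study.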

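Proof of Lemma 2. We directly show the second part of the Lemma, since the first can be studied
with similar techniques. From a Taylor expansion,

$$
\begin{aligned}
L_n^\tau(\theta,f^\tau,J_0) &= (\theta-\theta_0)'\sum_{i=1}^n \delta_i W_{in} J_0(X_i,c)\mathbf{1}_{Z_i\in A_\tau}\,\frac{\nabla_\theta f^\tau_{\theta_0}(Z_i,X_i)}{f^\tau_{\theta_0}(Z_i,\theta_0'X_i)} \\
&\quad + \frac{1}{2}(\theta-\theta_0)'\left[\sum_{i=1}^n \delta_i W_{in} J_0(X_i,c)\mathbf{1}_{Z_i\in A_\tau}\,\nabla^2_\theta[\log f^\tau_{\tilde\theta}](Z_i,X_i)\right](\theta-\theta_0) \\
&\quad + L_n^\tau(\theta_0,f^\tau,J_0), \qquad (16)
\end{aligned}
$$

for some $\tilde\theta$ between $\theta_0$ and $\theta$. Theorem 4 provides an i.i.d. representation for the first term
(which corresponds to $W_{n,\tau}$ in Lemma 2), while, from our assumptions, the family of functions
$\nabla^2_\theta[\log f^\tau_\theta](y,x)\mathbf{1}_{y\in A_\tau}$ is a VC-class of functions satisfying (15). Hence the sum in the second
term of (16) tends to $V$ almost surely using a uniform law of large numbers property. □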



7.2 The gradient of $f$

In the following for any function ip we will denote by V^Jf (•) the expression h~"'ip^'^>{-/h) such 
as, for example i^^(-) = h~^K' {'/h) ■ 



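Proposition 5. Let

$$f_1^\tau(y,u) = \partial_u f^\tau_{\theta_0}(y,u).$$

We have

$$\nabla_\theta f^\tau_{\theta_0}(y,x) = x f_1^\tau(y,\theta_0'x) + f_2^\tau(y,\theta_0'x),$$

with

$$f_2^\tau(y,\theta_0'x) = -f_1^\tau(y,\theta_0'x)\,E[X \mid \theta_0'X].$$

In particular, $E[\nabla_\theta f^\tau_{\theta_0}(Y,X) \mid \theta_0'X] = 0$.

Proof. Direct adaptation of Lemma 5A in Dominitz and Sherman (2005). □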



7.3 Convergence properties of $f^{*h,\tau}$

We first recall some classical properties of kernel estimators. Consider the class of functions $\mathcal{K}$
introduced in our assumptions on the kernel. Let $N(\varepsilon,\mathcal{K},d_Q)$ be the minimal number of balls $\{g : d_Q(g,g') < \varepsilon\}$
of $d_Q$-radius $\varepsilon$ needed to cover $\mathcal{K}$. For $\varepsilon > 0$, let $N(\varepsilon,\mathcal{K}) = \sup_Q N(\kappa\varepsilon,\mathcal{K},d_Q)$, where the
supremum is taken over all probability measures $Q$ on $(\mathbb{R}^d,\mathcal{B})$ and $d_Q$ is the $L^2(Q)$-metric. From
Nolan and Pollard (1987), it can easily be seen that, using a kernel $K$ satisfying these assumptions,
for some $C > 0$ and $\nu > 0$, $N(\varepsilon,\mathcal{K}) \le C\varepsilon^{-\nu}$, $0 < \varepsilon < 1$.

Proposition 6. Under assumptionUl we have, for some c > 



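$$\sup_{x,y,h,\tau} \left|f^{*h,\tau}_{\theta_0}(y,\theta_0'x) - f^{\tau}_{\theta_0}(y,\theta_0'x)\right| \mathbf{1}_{y\in A_\tau} J_0(x,c) = O_P\left(n^{-1/2}h^{-1}[\log n]^{1/2}\right), \qquad (17)$$

$$\sup_{x,y,h,\tau} \left|\nabla_\theta f^{*h,\tau}_{\theta_0}(y,x) - \nabla_\theta f^{\tau}_{\theta_0}(y,x)\right| \mathbf{1}_{y\in A_\tau} J_0(x,c) = O_P\left(n^{-1/2}h^{-2}[\log n]^{1/2}\right), \qquad (18)$$

$$\sup_{x,y,h,\tau,\theta} \left|\nabla^2_\theta f^{*h,\tau}_{\theta}(y,x) - \nabla^2_\theta f^{\tau}_{\theta}(y,x)\right| \mathbf{1}_{y\in A_\tau} J_\theta(x,c) = O_P\left(n^{-1/2}h^{-3}[\log n]^{1/2}\right). \qquad (19)$$
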
Proof. (17) is a direct application of Theorems 1 and 4 in Einmahl and Mason (2005). For (18),
we only show the convergence for the term






1 " 



j=i 



Define 



and 



'eo 



feY(a;,y) = -^E [ly^^, Jo(x,c)(X - x)K;(X'0o - x'eo)Kh{Y - y) 



{x,y) = ^ {e [{X - xW^X = u,Y = y]ly^A^Ux,c)fe'^^x.Yiuy)] 



Note t hat, from our assumptions rl is a finite quantity. Next, Theorem 4 in 



Einmahl and Mason 



(120051) yields : 



sup 

x,y,h,T 



f,Y(x,2/) - f,Y(x,y) ly^A. = Op{n-'/^h-^[logn]^/^). 



'or th e bias term, sup^, ^ /^ 



y,n.,T 



re'n{x,y) - rUx,y) lyg^, = ^(/i^) = o(n ^/^), (see e 



Bosq and Lecoutre 



19971)). As a consequence. 



sup 

x,y,h,T 



f^;:ix,y) - rl{x,y) l.^A. = Opin-'/^h-'[logn]'/^). 



For (|T9]) . we also need an uniformity with respect to 9. The result can be deduced from the 
uniform convergence (with respect to 9, x, u) of quantities such as 



1 

Sy{9,x,y,l3) = -^Y.^iW:(p{ZuXu9)vlK 



i=l 



rx, - 9' 



h 



x\ ^l Z,-y 



h 



(20) 



where VqK{[9' Xi — 9'x]h~^) for /3 = 1 (resp. for (5 = 2) denotes the gradient vector of function 
K{\9'Xi — 9'x\/h) (resp. Hessian matrix) with respect to 9 and evaluated at 0, and where (j) 
is a bounded function with respect to 6 and x. The function (j) we consider is 4'{Z,X,6) = 
fg,^{9'X)''^tzeArJo{x,c) with the convention 0/0 = and where f^iyjO' X) is the conditional 



Einmahl and Mason 



densit y of 9'X given Y £ Ar- ([20]) can be studied using the same method as 
2OO5I). For this, observe that the family of functions {{X,Z) -^ V^K(\9'X - 9'x]h-^)K 



y]h ^),9 £ Q,x,y } satisfies the Assumption s of Proposition 1 in 



Einmahl and Mason 



(Cmma 22 (ii) in 



19941 ). see also 



Nolan and PollardI ( 



3 



Einmahl and MasonI (|2005l )) to obtain that 



19871 ') '1. Hence, apply Talagrand's inequality (JTalagrand 



i[Z 



200,51) (see 



sup I S'^ 



h.T / 



i,x,y,a) 



9,x,y,h,T 

Again, the bias term converges uniformly at rate 0{h'^) 



E[St^^{9,x,y,a)]\\y^A. = Op{n-^l\\ogn]^/^h-^-P) 



D 
7.4 The difference between $f^*$ and $\hat f$

7.4.1 Convergence rate of $\hat f$

In this section, we show that replacing $f^{*h,\tau}$ by $\hat f^{h,\tau}$ (which is the estimator used in practice)
does not modify the rate of convergence. To give the intuition of this result, observe that $f^{*h,\tau}$
was obtained from $\hat f^{h,\tau}$ by replacing $\hat G$ by $G$. Let us recall some convergence properties of $\hat G$.
We have

$$\sup_{t\le\tau_0} |\hat G(t) - G(t)| = O_P(n^{-1/2}), \qquad (21)$$

$$\sup_{t\le\tau_0} \frac{1 - G(t)}{1 - \hat G(t)} = O_P(1). \qquad (22)$$

See Gill (1983) for (21) and Zhou (1992) for (22). From (21), we see that the convergence rate of
$\hat G$ is faster than the convergence rate of $f^{*h,\tau}$, which explains the asymptotic equivalence of $\hat f^{h,\tau}$
and $f^{*h,\tau}$. Lemma 7 makes things more precise and also gives a representation of the difference
between $\nabla_\theta \hat f^{h,\tau}_\theta$ and $\nabla_\theta f^{*h,\tau}_\theta$, which is needed in the proof of the Main Lemma. Also required to
prove our Main Lemma, Lemma 8 below gives a technical result on the integral of this difference.



sup 

x,y,h,T 



sup 

x,y,h,T 

sup 

x,y,h,T,6 



Lemma 7. Under the Assumption of Lemma\B, we have for some c > 

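$$\sup_{x,y,h,\tau} \left|\hat f^{h,\tau}_{\theta_0}(y,\theta_0'x) - f^{*h,\tau}_{\theta_0}(y,\theta_0'x)\right| \mathbf{1}_{y\in A_\tau} J_0(x,c) = O_P(n^{-1/2}),$$

$$\sup_{x,y,h,\tau} \left|\nabla_\theta \hat f^{h,\tau}_{\theta_0}(y,x) - \nabla_\theta f^{*h,\tau}_{\theta_0}(y,x)\right| \mathbf{1}_{y\in A_\tau} J_0(x,c) = O_P(n^{-1/2}h^{-1}),$$

$$\sup_{x,y,h,\tau,\theta} \left|\nabla^2_\theta \hat f^{h,\tau}_{\theta}(y,x) - \nabla^2_\theta f^{*h,\tau}_{\theta}(y,x)\right| \mathbf{1}_{y\in A_\tau} J_\theta(x,c) = O_P(n^{-1/2}h^{-2}).$$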

Furthermore, for x such as Jo{x,c) / 0, 

I;, r 9'}::,y{x2,y2)dF{x,, y2)dMG{t) 



(23) 
(24) 
(25) 



Ve/g' (y,x)- Ve/g ' {y,x) 



do 



1 - H{t) 



Rn{T,h,x,y), 



(26) 



where M^{y) = n ^ ^21=1 ^^F iv) ^ ^^F *^ defined in Theorem\^ sup^, j^,^/j |i?„(r, /i, x,y)| 
Op((logn)-^/^n~^/i~^) and gh ^ is defined by 



n,T , N '^{Xl- X2)K'y^{Q'QXx - 6QX2)Kh{yi - 2/2 
9f,^,,y,[X2,y2) = J^ 



h,T 



KhiO'oXi - e'^X2)Kh{yi - y2)f'elx{0'oxi) 

where f'gJx denotes the derivative of u ^ /J, ^{u), the conditional density of 9'qX given Y G A^-. 




Lemma 8. Under the Assumptions of Lem,ma\^ 



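$$\sup_{h,\tau} \left|\int [\nabla_\theta \hat f^{h,\tau}_{\theta_0}(y,x) - \nabla_\theta f^{*h,\tau}_{\theta_0}(y,x)]\,\mathbf{1}_{y\in A_\tau}\,J_0(x,c/4)\,dF(x,y)\right| = o_P(n^{-1/2}).$$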



Proof of Lemma 7. To prove (23)-(25), we only prove (25), since the others are similar. To prove
(25), we only consider the terms in which the second derivative is involved, the others being
studied analogously. Consider

1 '" 

■j^Y^^iWUXi - x)K';{e'Xi - 9'x)Kh{Zi - y){X, - x)'lz,eA./e'x(^'^)"' 

1 " 



j=i 



1 " 
+ j^Y.^i^^^G{Zi-){Xi - x)K'^{e'X, - e'x)Kh{Z, - y){Xi - xyiz,^AJS'xi9'x)-\ 

i=l 

where the first term is contained in Vgfg '^, while the second can be bounded by 



1 V^At ^r.'nf (^'Xi-0'x \,,,,fZ,-y 
—^2^6^1z,<ro\K I ( -. 1 \K 



n. 



h 



h 



Using the results of Sherman (1994), the term inside the brackets is $O_P(1)$ uniformly in $x$, $y$, $\theta$
and $h$.

Now, for the representation (26), observe that



^e[fe^^ - f*g^'^]{y,x) 



h-^ Y^ 5,W*Zg{Z,-){x - X,)KMx - e',X,)K,iy - Z,)f^,M^)-Hz,^A. 

i=\ 

n 

-Y,S^W*ZG{Z,-)Mx,c)K{9'oX,-9'ox)Kh{Z,-y)f;^^M^)f^,M^rHz,eA. 



i=l 



+ R'n{T,h,x,y), 



(27) 



with supj,,^^^,- \R'n{T,h,x,y)\ = Op (^n~^h~^''^[logn]^''^) , from the convergence rate of Zq (see 
([2T]) and ([22]) ) and the convergence rate of the denominator in ^ and its derivative, say 
ifgix ~ fe'x) ^^^ ^f'e'x ~ ffyx) ('which are of uniform rate Op (n"^/^/!"^/^ [log n]^/^) and 
Op (n~^/^/i~^/^[logn]^/^) from arguments similar as for the proofs of ((I71)-([l9|) and ([23]) -(l25l)). 
An i.i.d. representation of the main term in (l27l) can be deduc ed from Theorem |4] since the class 



t3^^,t 



{h gr'^ ,x,y,h} is a VC-class from 



Nolan and PollardI (119871 ) 



D 



Proof of LemmalM Observe that, from classical kernel arguments 



sup 

t 



h,T 



X2,t<y2<ro 



9'}:lyix2,y2)Mx, c/i)dFix2,y2)dFix, y) - E[Vefe,iY, X)UX, c/4)] 



0(/i^ 
since $K$ is of order 4. From the representation (26) in Lemma 7,



c*h,T / 



Vefe^^y.x) - V,/;;'"(y,x)]Jo(x,c/4)(iP(x,y) 



1 - H{t)]-'E[Vere,{Y,X)MX,c/A)]dM'^{t) 
+ Al - H{t)r^ \ ff g''/;^Jx2,y2)Mx,c/4)dF{x2,y2)d¥{x,y) 

J iJ Jx2,t<y2<T0 



-E{VefS,{Y,X)MX,c/A)] 
+ / RniT,h,x,y)dF{x,y), 



dM^{t) 



(28) 



where the last term is $o_P(n^{-1/2})$ uniformly in $\tau$ and $h$. The first one is zero from Proposition 5
and since $J_0$ only depends on $\theta_0'X$.
For the second, let 0n(ii h, r) = 
[1 - i^(t)]-H/L,,i<,,<.o %:;,(X2, 2/2) Jo(rE, c/4)dF(x2, y2)dP(x, y) - S[V,/,;(y, X) Jo(X, c/4)]}. 
Using the fact that Tin is of cardinality A;„, we have, for the second term in ([281) . 



P( sup 

\h&Hn 



^n{t,h,T)dM'-{t) 



> e ) < fen sup ] 



(t>n{t,h,T)dM'^{t) 



>e . 



Now apply Lenglart's inequality (see Lenglart (1977) or Theorem 3.4.1 in Fleming and Harrington
(1991)). This shows that, for all $\varepsilon > 0$ and all $\eta > 0$,

2 
rG/ 



sup < / Mt,h,T)dM'-{t)\ >£' 

^T<S<TO I JO J J 



(29) 



As mentioned before, $\sup_t |\phi_n(t,h,\tau)| = O(h^4)$. From (29) and the condition on $k_n$ in our
assumptions, the Lemma follows. □

7.4.2 Donsker classes

As stated in Assumption 7, to obtain an $n^{-1/2}$-convergence of $\hat\theta$, we need the regression function
(and its gradient) to be sufficiently regular. In the Proposition below, we first show that the classes
of functions defined in Assumption 7 are Donsker, and that $\hat f^{h,\tau}_{\theta_0}$ also belongs to the same regular
class as $f^\tau_{\theta_0}$ with probability tending to one.

Proposition 9. Consider the classes $\mathcal{H}_1$ and $\mathcal{H}_2$ defined in Assumption 7. $\mathcal{H}_1$ and $\mathcal{H}_2$ are
Donsker classes. Furthermore, $\hat f^{h,\tau}_{\theta_0}$ and $\nabla_\theta \hat f^{h,\tau}_{\theta_0}$ belong respectively to $\mathcal{H}_1$ and $\mathcal{H}_2$ with probability
tending to one for some constant $M$ sufficiently large.
Proof. The class $\mathcal{H}_1$ is Donsker from Corollary 2.7.4 in Van der Vaart and Wellner (1996). The
class $\mathcal{H}_2$ is Donsker from a permanence property of Donsker classes, see Examples 2.10.10 and
2.10.7 in Van der Vaart and Wellner (1996). We only show the proof for $\nabla_\theta \hat f^{h,\tau}_{\theta_0}$, since the one
for $\hat f^{h,\tau}_{\theta_0}$ is similar. Write



^eft''{z,x) 



nh ^ 



-MXi,c/2) 



i=l 






i=l 



6ilz,eAAX^ - x)K'^%X, - e',x)KhiZ, - z)[f^,^^{e',x) - f^,^M^)] 
[I - G{Z,-)]t {e',x)fL^{0'ox) 



■MX„c/2) 



1 " (X, - x)K'f^{e'oXi - e'Qx)Jo{X„ c/2) 
1 y. 6^Kh%Xi - e'^x)Kh{Zi - z)lz,^A. 



i=\ 



[l-G'(Z-)] 



+ 



■ 1 " (^^ - x)K'^(e'^X, - e',x) [{f^,Mx)Y - (/4x(^o^))'] -^0(^^,0/2) 



1 y- 6^1z^eA j<hiOoX, - 9'^x)Kh{Zi 



i=l 



[^-G{Z,-)] 

From this expression, we clearly see that $\nabla_\theta \hat f^{h,\tau}_{\theta_0}(y,x) = x\phi_1(x'\theta_0,y) + \phi_2(x'\theta_0,y)$. Now we
must check that $\phi_1$ and $\phi_2$ are in $\mathcal{H}_1$ with probability tending to one. Since the functions
are twice continuously differentiable (from the assumptions on $K$), we only have to check their
boundedness. From Lemma 7, this can be done at first by replacing $\hat f^{h,\tau}$ by $f^{*h,\tau}$ (i.e. $\hat G$ by the
true function $G$). Among the several terms in the decomposition of $\nabla_\theta f^{*h,\tau}$, we will only study

J_ y. 6,tz,eA.XiK'f^{e'oXi - u)Kh{Zi - z)Jo(Xi, c/2) 
nh ^ 



Hu,y) 



i=l 



[1 - G(Z,-)]/-^(n) 



since the others are similar. We will show that the derivatives of order 0, 1 and $1+\delta$ of this
function are uniformly bounded by some constant $M$ with probability tending to one.



Einmahl and Mason 



Now a centered version of i;^ converges to zero at rate Op ([log n]^/^n ^'^h ^) (see[ 
(|2005l )). which tends to zero as long as nh^ — > cxo. Furthermore, E[(j)] is uniformly bounded from 



our Assumption [7] on the regression function. For the derivative. 



du<P{u,y) 



1 A 5ilz^<^A^XiK'l{9'^X, - u)Kh{Zi - z)Jo(Xi, c/4) 



—y 

nh ^-^ 



nh 



1=1 



[l-G{Z,-)]n, 



u 



1 J^ 6ilz,eA^X,K'^{e'oXi - u)Kh{Zi - z)Jo{X,, c/4)/^T 



x^ 



nh 



1=1 



[i-G{Zi-)]{a^iu)y 
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Again , E[du4>] is uniformly bounded from our Assumption[7l Now using the results of lEinmahl and Mason 
(|2005l ). the centered version of du4> tends to zero provided that nh^ -^ oo. The same argu- 
ments apply for dyCp. Hence, with fi{u,y) = E{(pi{u,y)) we proved that sup^^ y\didy(j)i{u,y) — 
di,dyfi{u,y)\ tends to zero in probability for i = 1,2, k + j < I. Now we have to show that 
du4>j and dyCpj are 6— Holder for j = 1,2 with an Holderian constant bounded by some M with 
probability tending to one. We only prove the result for du(pi- We have 



\du(t)i{u,y) -du(f>i{u',y')\ I \du4>iiu,y) -du(t)2{u',y')\ 
— — — — ^ = max sup 

u',y',x,y IK^i yj ~ l"" >y Jll \\u~u'\>n''^,y,y' 



■i',y',x,y \\{u,y)-{u',y')\\^ \\u~u'\>n'\y,v' \\{u,y) - (.^ ,y')\\^ 



sup 



\du4>i{u,y) -du<l)i{u',y')\ 



We have 



5*1 < sup 



\u-u'\'<n-\y,y' \\{u,y) - {u',y')\\^ 

= max(S'i, 52). 
\dufiiu',y') -duh{u,y)\ 



\\{u',y')-{u,y)\\^ 
+ 2n^ sup \du(t)i{u,y) - dufi{u,y)\. 

u,y,u',y' 

Prom our Assumptions, the first supremum is bounded, while the last is 

Op{n~^'^~^^[logn]^''^h~^) from the convergence rate of du(t>2- It tends to zero provided that 
nh^~^ — > 00. For ^2, since K is C^ with bounded derivatives, for some positive constant M, 

^^p \dMn,y) -j^^M<y')\ < M X Op(i)|| V |K(-)|||.. 

\\{u,y)-{u',y')\\<n-\y,y' II (",?/)- K, 2/') 11 ^ 

1 " f) 

sup \\(u,y)-{u',y')\\'~'h-'—J2 , '(7 V 

The last supremum is bounded by Op(l) x n~^'^ h~^ , and it tends to zero when nb^ — > cxd (and 
the Op(l) term does not depend on u,y). D 
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